\subsection{Introduction and Lagrange sufficiency}
Consider minimising \(f(\vb x)\) subject to \(\vb x \in \mathcal X\), \(h(\vb x) = \vb b\) where \(h \colon \mathbb R^n \to \mathbb R^m\).
The \textit{Lagrangian} associated with this problem is
\[
	L(\vb x, \bm \lambda) = f(\vb x) - \bm \lambda^\transpose(h(\vb x) - \vb b)
\]
where \(\bm\lambda \in \mathbb R^m\) is the vector of Lagrange multipliers.
We want to instead minimise \(L(\vb x, \bm \lambda), x \in \mathcal X\).
\begin{theorem}[Lagrange Sufficiency]
	Suppose we can find a \(\bm \lambda^\star\) such that
	\begin{enumerate}
		\item \(\min_{\vb x \in \mathcal X} L(\vb x, \bm \lambda^\star) = L(\vb x^\star, \bm\lambda^\star)\)
		\item \(\vb x^\star \in \mathcal X(\vb b) = \qty{\vb x \colon \vb x \in \mathcal X, h(\vb x) = \vb b}\)
	\end{enumerate}
	Then \(\vb x^\star\) is optimal for the original constrained problem, i.e.
	\[
		\min_{\vb x \in \mathcal X(\vb b)} f(\vb x) = f(\vb x^\star)
	\]
\end{theorem}
\begin{proof}
	First, note that condition (ii) states that \(f(\vb x^\star) \geq \min_{\vb x \in \mathcal X(\vb b)}f(\vb x)\), because \(\vb x^\star\) is feasible.
	Then,
	\begin{align*}
		\min_{\vb x \in \mathcal X(\vb b)}f(\vb x) & = \min_{\vb x \in \mathcal X(\vb b)} f(\vb x) - \underbrace{(\bm\lambda^\star)^\transpose (h(\vb x) - \vb b)}_{0 \text{ when } \vb x \in \mathcal X(\vb b)} \\
		                                           & \geq \min_{\vb x \in \mathcal X} f(\vb x) - (\bm\lambda^\star)^\transpose (h(\vb x) - \vb b)                                                                \\
		                                           & = \min_{\vb x \in \mathcal X} L(\vb x, \bm \lambda^\star)                                                                                                         \\
		                                           & = L(\vb x^\star, \bm \lambda^\star)                                                                                                                         \\
		                                           & = f(\vb x^\star) - (\bm\lambda^\star)^\transpose (h(\vb x^\star) - \vb b)                                                                                   \\
		                                           & = f(\vb x^\star)
	\end{align*}
\end{proof}

\begin{example}
	\begin{alignat*}{2}
		 & \underset{x \in \mathbb R^3}{\text{minimise}} & \qquad & -x_1 -x_2 + x_3     \\
		 & \text{subject to}                             &        & x_1^2 + x_2^2 = 4   \\
		 &                                               &        & x_1 + x_2 + x_3 = 1
	\end{alignat*}
	In this problem, we have
	\[
		h(\vb x) = \begin{pmatrix}
			x_1^2 + x_2^2 \\ x_1 + x_2 + x_3
		\end{pmatrix};\quad \vb b = \begin{pmatrix}
			4 \\ 1
		\end{pmatrix}
	\]
	Taking Lagrange multipliers, we have
	\begin{align*}
		L(\vb x, \bm\lambda) & = (-x_1 - x_2 + x_3) - \lambda_1 (x_1^2 + x_2^2 - 4) - \lambda_2(x_1+x_2 + x_3 - 1)                                               \\
		                     & = (-(1+\lambda_2) x_1 - \lambda_1 x_1^2) + (-(1 + \lambda_2)x_2 - \lambda_1 x_2^2) + (1 - \lambda_2)x_3 + 4 \lambda_1 + \lambda_2
	\end{align*}
	We want to fix a value of \(\bm \lambda\) and minimise \(L\), only considering solutions such that \(\vb x^\star\) is finite.
	Note that if \(\lambda_1 > 0\), then the first bracket can be made as small as we like by picking very small values of \(x_1\); this bracket would diverge to negative infinity so we cannot choose such a \(\lambda_1\).
	If \(\lambda_2 \neq 1\), the infimum is also negative infinity by considering the \(x_3\) term.
	So let us consider \(\lambda_1 \leq 0, \lambda_2 = 1\).
	Setting the derivative of the first term to zero, we have
	\begin{align*}
		\dv{x_1} (-(1+\lambda_2) x_1 - \lambda_1 x_1^2) & = -(1+\lambda_2) - 2 \lambda_1 x_1 = 0 \\
		\implies x_1                                    & = \frac{-1 - \lambda_2}{2\lambda_1}    \\
		                                                & = \frac{-2}{2\lambda_1}                \\
		                                                & = \frac{-1}{\lambda_1}
	\end{align*}
	Setting the derivative of the second term to zero,
	\begin{align*}
		\dv{x_1} (-(1+\lambda_2) x_2 - \lambda_1 x_2^2) & = -(1+\lambda_2) - 2 \lambda_1 x_2 = 0 \\
		\implies x_2                                    & = \frac{-1}{\lambda_1}
	\end{align*}
	We now want to choose \(\lambda_1\) such that \(x_1, x_2, x_3\) satisfy the constraints.
	\[
		x_1^2 + x_2^2 = 4 \implies x_1^2 = x_2^2 = 2 \implies x_1 = x_2 = \sqrt{2}
	\]
	Note that \(x_1, x_2 > 0\) since \(\lambda_1 \leq 0\), and correspondingly \(\lambda_1 = \frac{-1}{\sqrt{2}}\).
	Further, we can now find \(x_3 = 1 - 2\sqrt{2}\).
	This solution optimises the original problem.
\end{example}

\subsection{Using Lagrange multipliers in general}
Consider the problem
\begin{alignat*}{2}
	 & \underset{\vb x \in \mathcal X}{\text{minimise}} & \qquad & f(\vb x)            \\
	 & \text{subject to}                                &        & h(\vb x) \leq \vb b
\end{alignat*}
We can solve this problem using the following steps.
\begin{enumerate}[(1)]
	\item Add a slack variable \(\vb s\) to transform the problem to
	      \begin{alignat*}{2}
		       & \underset{\vb x \in \mathcal X}{\text{minimise}} & \qquad & f(\vb x)                 \\
		       & \text{subject to}                                &        & h(\vb x) + \vb s = \vb b \\
		       &                                                  &        & \vb s \geq 0
	      \end{alignat*}
	\item Calculate the Lagrangian,
	      \[
		      L(\vb x, \bm \lambda, \vb s) = f(\vb x) - \bm \lambda^\transpose (h(\vb x) + \vb s - \vb b)
	      \]
	\item Let
	      \[
		      \bm \Lambda = \qty{\bm \lambda \colon \inf_{\vb x \in \mathcal X;\; \vb s \geq 0} L(\vb x, \vb s, \bm \lambda) > -\infty}
	      \]
	\item For each \(\bm\lambda \in \bm\Lambda\), find \(\vb x^\star(\bm \lambda), \vb s^\star(\bm \lambda)\) such that
	      \[
		      \min_{\vb x \in \mathcal X;\; \vb s \geq 0} L(\vb x, \vb s, \bm \lambda) = L(\vb x^\star(\bm \lambda), \vb s^\star(\bm \lambda), \bm \lambda)
	      \]
	\item Find \(\bm\lambda^\star \in \bm\Lambda\) such that \((\vb x^\star(\bm \lambda), \vb s^\star(\bm \lambda))\) is feasible, i.e.
	      \[
		      h(\vb x^\star(\bm \lambda^\star)) = \vb b;\quad \vb s^\star (\bm \lambda^\star) \geq 0
	      \]
\end{enumerate}

\subsection{Complementary slackness}
In step (4) above, we want to minimise the Lagrangian, i.e.
\begin{alignat*}{2}
	 & \underset{\vb x \in \mathcal X}{\text{minimise}} & \qquad & f(\vb x) - \bm \lambda^\transpose (h(\vb x) - \vb b) - \bm \lambda^\transpose \vb s \\
	 & \text{subject to}                                &        & \vb s \geq 0
\end{alignat*}
Suppose, for a particular value of \(\bm \lambda\), that we solve this problem and arrive at \(\vb x^\star(\bm \lambda), \vb s^\star(\bm \lambda)\).
Let
\[
	\bm \lambda = \begin{pmatrix}
		\lambda_1 \\ \vdots \\ \lambda_m
	\end{pmatrix}
\]
If \(\lambda_i > 0\), then for some large \(\vb s\) we can make \(f \to -\infty\), hence \(\bm \lambda \notin \bm \Lambda\).
Hence, given \(\bm \lambda \in \bm \Lambda\), we must have \(\lambda_i \leq 0\).
Now, if \(\lambda_i < 0\) for some \(i\), we would want to choose \(s_i = 0\) to minimise the increase to the function caused by the slack variable.
If \(\lambda_i = 0\), then \(s_i\) can be chosen arbitrarily since it will have no increase on the value of \(f\).
With these choices of \(s_i\), we can make \(\bm \lambda^\transpose \vb s = 0\), thus making the slack variable not impact the value of \(f\).
So either
\begin{itemize}
	\item \(h(\vb x)_i = b_i\) and \(\lambda_i \leq 0\), or
	\item \(h(\vb x)_i \geq b_i\) and \(\lambda_i = 0\).
\end{itemize}
Alternatively (less precisely),
\[
	\lambda_i s_i = 0
\]
In other words, either the constraint inequality is tight (defined by an equality) and the Lagrange multipliers are slack (defined by an inequality), or the constraint inequality is slack and the Lagrange multipliers are tight.

\begin{example}
	\begin{alignat*}{2}
		 & \underset{\vb x \in \mathbb R^2}{\text{minimise}} & \qquad & x_1 - 3x_2           \\
		 & \text{subject to}                                 &        & x_1^2 + x_2^2 \leq 4 \\
		 &                                                   &        & x_1 + x_2 \leq 2
	\end{alignat*}
	Adding slack variables, we have

	\begin{alignat*}{2}
		 & \underset{\vb x \in \mathbb R^2}{\text{minimise}} & \qquad & x_1 - 3x_2              \\
		 & \text{subject to}                                 &        & x_1^2 + x_2^2 + s_1 = 4 \\
		 &                                                   &        & x_1 + x_2 + s_2 = 2     \\
		 &                                                   &        & s_1 \geq 0              \\
		 &                                                   &        & s_2 \geq 0
	\end{alignat*}
	Taking the Lagrangian,
	\begin{align*}
		L(\vb x, \vb s, \bm \lambda) & = (x_1 - 3x_2) - x_1(x_1^2 + x_2^2 + s_1 - 4) - \lambda_2(x_1 + x_2 + s_2 - 2)                                                                       \\
		                             & = \qty((1 - \lambda_2)x_1 - \lambda_1 x_1^2) + \qty((-3-\lambda_2)x_2 - \lambda_1 x_2^2) - \lambda_1 s_1 - \lambda_2 s_2 + (4\lambda_1 + 2\lambda_2)
	\end{align*}
	We must have \(\lambda_1, \lambda_2 \leq 0\) by considering the slack variable.
	By complementary slackness,
	\[
		\lambda_1 s_1 = \lambda_2 s_2 = 0 \text{ at the optimum}
	\]
	Minimising each term independently, we have
	\begin{align*}
		1 - \lambda_2 - 2\lambda_1 x_1 & = 0 \\
		-3-\lambda_2 - 2\lambda_1 x_2  & = 0
	\end{align*}
	If \(\lambda_1 = 0\), the above two equations are contradictory.
	Hence \(\lambda_1 < 0\), giving \(s_1 = 0\).
	If \(\lambda_2 < 0\), then \(s_2 = 0\) by complementary slackness, so
	\begin{align*}
		1 - \lambda_2 - 2\lambda_1 x_1 & = 0 \\
		-3-\lambda_2 - 2\lambda_1 x_2  & = 0 \\
		x_1^2 + x_2^2                  & = 4 \\
		x_1 + x_2                      & = 2
	\end{align*}
	Solving the lower two equations give
	\[
		(x_1, x_2) = (0, 2), (2, 0)
	\]
	If \((x_1, x_2) = (0, 2)\), solving the first two equations gives \((\lambda_1, \lambda_2) = (1, -3)\) which is impossible since \(\lambda_1\) must be negative.
	Similarly, if \((x_1, x_2) = (2, 0)\), solving the first two equations gives \((\lambda_1, \lambda_2) = (-1, 1)\) which is impossible again.
	We have ruled out every case apart from \(\lambda_1 < 0, \lambda_2 = 0\).
	In this case,
	\begin{align*}
		1 - 2\lambda_1 x_1  & = 0 \\
		-3 - 2\lambda_1 x_2 & = 0 \\
		x_1^2 + x_2^2       & = 4 \\
		x_1 + x_2 + s_2     & = 2
	\end{align*}
	The first two equations give
	\[
		x_1 = \frac{1}{2\lambda_1};\quad x_2 = \frac{-3}{2\lambda_1}
	\]
	Substituting into the third equation,
	\[
		\lambda_1^2 = \frac{5}{8} \implies \lambda_1 = -\sqrt{\frac{5}{8}}
	\]
	Hence,
	\[
		(x_1, x_2) = \qty(-\sqrt{\frac{2}{5}}, -3\sqrt{\frac{2}{5}})
	\]
	which is feasible using the fourth equation.
	By Lagrange sufficiency, this is the optimum for the original problem.
\end{example}

\subsection{Weak duality}
We would like to solve a problem
\begin{alignat*}{2}
	 & \underset{\vb x \in \mathcal X}{\text{minimise}} & \qquad & f(\vb x)         \\
	 & \text{subject to}                                &        & h(\vb x) = \vb b
\end{alignat*}
by constructing the Lagrangian
\[
	L(\vb x, \bm \lambda) = f(\vb x) - \bm\lambda^\transpose (h(\vb x) - \vb b)
\]
We now define the quantity
\[
	g(\bm \lambda) = \inf_{\vb x \in \mathcal X} L(\vb x, \bm \lambda)
\]
\begin{theorem}[Weak duality theorem]
	If \( \vb x \in \mathcal X(\vb b) \) and \( \bm\lambda \in \bm\Lambda \), then \( f(\vb x) \geq g(\bm \lambda) \).
	In particular,
	\[
		\inf_{\vb x \in \mathcal X(\vb b)} f(\vb x) \geq \sup_{\bm \lambda \in \bm \Lambda} g(\bm \lambda)
	\]
\end{theorem}
\begin{proof}
	\begin{align*}
		\inf_{\vb x \in \mathcal X(\vb b)} f(\vb x) & = \inf_{\vb x \in \mathcal X(\vb b)} f(\vb x) - \bm \lambda^\transpose (h(\vb x) - \vb b) \\
		                                            & \geq \inf_{\vb x \in \mathcal X} f(\vb x) - \bm \lambda^\transpose (h(\vb x) - \vb b)     \\
		                                            & = \inf_{\vb x \in \mathcal X} L(\vb x, \bm \lambda)                                       \\
		                                            & = g(\bm\lambda)
	\end{align*}
\end{proof}
Using this weak duality property, if \( \inf_{\vb x \in \mathcal X(\vb b)} f(\vb x) \) is difficult to solve, we can first attempt \( \sup_{\bm \lambda \in \bm \Lambda} g(\bm \lambda) \).
The problem
\begin{alignat*}{2}
	 & \text{maximise}   & \qquad & g(\bm \lambda)              \\
	 & \text{subject to} &        & \bm \lambda \in \bm \Lambda
\end{alignat*}
is called the \textit{dual problem}.
The original is called the \textit{primal problem}.
The optimal cost of the primal problem is always greater than or equal to the optimal cost of the dual problem.
The \textit{duality gap} is the difference:
\[
	\inf_{\vb x \in \mathcal X(\vb b)} f(\vb x) - \sup_{\bm \lambda \in \bm \Lambda} g(\bm \lambda)
\]
If the duality gap is zero, then we say that \textit{strong duality} holds.
This strengthens the inequality into an equality.

\subsection{Strong duality and the Lagrange method}
If the Lagrange method works, then we know that
\[
	\inf_{\vb x \in \mathcal X} L(\vb x, \bm \lambda) = \inf_{\vb x \in \mathcal X(\vb b)} L(\vb x, \bm \lambda)
\]
So, taking such a \( \bm \lambda \) in the proof above, we have equality instead of inequality.
Hence the problem has strong duality.
Conversely, if the duality gap is zero, then there exists a \( \bm \lambda \) such that the inequality above is an equality.
Hence, for this \( \bm \lambda \),
\[
	\inf_{\vb x \in \mathcal X} L(\vb x, \bm \lambda) = \inf_{\vb x \in \mathcal X(\vb b)} L(\vb x, \bm \lambda)
\]
Hence this is the \( \bm \lambda \) which will solve the Lagrange method.
In summary, strong duality holds exactly when the Lagrange method works.

\subsection{Hyperplane condition for strong duality}
\begin{definition}
	A function \( \phi \colon \mathbb R^m \to \mathbb R \) is said to have a \textit{supporting hyperplane} at a point \( \vb b \) if there exists \( \bm \lambda \in \mathbb R^m \) such that for all \( \vb c \in \mathbb R^m \),
	\[
		\phi(\vb c) \geq \phi(\vb b) + \bm\lambda^\transpose (\vb c - \vb b)
	\]
\end{definition}
Pictorially, \( \phi \) has a supporting hyperplane if there is a plane passing through \( (\vb b, \phi(\vb b)) \), where \( \phi \) is always above the plane.
This could be, for example, a tangent plane at \( \vb b \).

\begin{definition}
	We define a function \( \phi \colon \mathbb R^m \to \mathbb R \) associated with the primal problem by
	\[
		\phi(\vb c) = \inf_{\vb x \in \mathcal X(\vb c)} f(\vb x)
	\]
	This \( \phi \) can be thought of as the optimal cost of a family of optimisation problems with different functional constraint values \( \vb c \).
	This is called the \textit{value function}.
\end{definition}

\begin{theorem}[Strong duality theorem]
	Strong duality holds if and only if the value function \( \phi \) has a supporting hyperplane at \( \vb b \).
\end{theorem}
\begin{proof}
	First, we show that a supporting hyperplane implies strong duality.
	We have \( \bm \lambda \) such that
	\[
		\phi(\vb c) \geq \phi(\vb b) + \bm\lambda^\transpose (\vb c - \vb b)
	\]
	Then, we have
	\begin{align*}
		g(\bm \lambda) & = \inf_{\vb x \in \mathcal X} f(\vb x) - \bm\lambda^\transpose (h(\vb x) - \vb b)                                                                                                                           \\
		               & = \inf_{\vb c} \inf_{\vb x \in \mathcal X(\vb c)} f(\vb x) - \underbrace{\bm\lambda^\transpose (h(\vb x) - \vb c)}_{\mathclap{\text{zero since we are extremising}}} - \bm\lambda^\transpose(\vb c - \vb b) \\
		               & = \inf_{\vb c} \phi(\vb c) - \bm\lambda^\transpose(\vb c - \vb b)                                                                                                                                           \\
		               & \geq \phi(\vb b)
	\end{align*}
	By weak duality, we also have the reverse direction: \( g(\bm \lambda) \leq \phi(\vb b) \)
	Hence, \( g(\bm \lambda) = \phi(\vb b) \) and strong duality holds.
	Conversely, if strong duality holds, we want to show the existence of such a hyperplane.
	We have We have \( \bm \lambda \) such that \( g(\bm \lambda) = \phi(\vb b) \).
	For such a \( \bm \lambda \), we have
	\begin{align*}
		\phi(\vb b) = g(\bm \lambda) & = \inf_{\vb x \in \mathcal X} f(\vb x) - \bm\lambda^\transpose (h(\vb x) - \vb b)                                        \\
		                             & = \inf_{\vb x \in \mathcal X} f(\vb x) - \bm\lambda^\transpose (h(\vb x) - \vb c) - \bm\lambda^\transpose(\vb c - \vb b) \\
		                             & \leq \phi(\vb c) - \bm\lambda^\transpose(\vb c - \vb b)
	\end{align*}
	The last inequality holds due to weak duality.
	So \( \bm \lambda \) gives a supporting hyperplane.
\end{proof}

\subsection{Strong duality and convex functions}
We would now like to consider for which problems \( \phi(\vb b) \) has a supporting hyperplane.
The following theorem is stated without proof.
\begin{theorem}
	A function \( \phi \colon \mathbb R^m \to \mathbb R \) is convex if and only if every point \( \vb b \in \mathbb R^m \) has a supporting hyperplane.
\end{theorem}
Now, for which problems do we have a convex value function?
\begin{theorem}
	Consider a minimisation problem
	\begin{alignat*}{2}
		 & \underset{\vb x \in \mathcal X}{\text{minimise}} & \qquad & f(\vb x)            \\
		 & \text{subject to}                                &        & h(\vb x) \leq \vb b
	\end{alignat*}
	with value function \( \phi \).
	Then \( \phi \) is convex if:
	\begin{enumerate}
		\item \( \mathcal X \) is convex;
		\item \( f \) is convex;
		\item \( h \) is convex.
	\end{enumerate}
\end{theorem}
This is proven in the example sheets.

\subsection{Shadow prices interpretation of Lagrange multipliers}
Suppose a factory owner produces \( n \) types of products from \( m \) types of raw materials.
Suppose the owner produces \( \vb x = (x_1, x_2, \dots, x_n) \) products, then the profit is some function \( f(\vb x) \).
We then create \( h_j(\vb x) \) to be the amount of raw material \( j \) consumed when making products \( \vb x \).
The owner wants to maximise \( f(\vb x) \) subject to \( h_i(\vb x) \leq b_i \) where \( b_i \) is the maximum amount of raw material \( i \) that is available.

Now, suppose a supplier offers some \( \bm\varepsilon = (\varepsilon_1, \varepsilon_2, \dots, \varepsilon_m) \) extra raw materials to the factory owner.
We would like to calculate how much this \( \bm\varepsilon \) is worth.
The factory owner will try to maximise this new problem, replacing \( \vb b \mapsto \vb b + \bm\varepsilon \).
For a small enough \( \bm\varepsilon \), this can be expressed easily using the value function.

\[
	\phi(\vb b + \bm\varepsilon) - \phi(\vb b) \approx \sum_{j=1}^m \pdv{\phi}{b_j}(\vb b) \varepsilon_j
\]
The quantity \( \pdv{\phi}{b_j}(\vb b) \) is the price of material \( j \), and \( \grad \phi(\vb b) \) is the vector of prices.
These are called the `shadow prices'; they are hidden to the outside world but depend on the internal state of the factory.

\begin{theorem}
	If \( \phi \) is differentiable at \( \vb b \) and has a supporting hyperplane given by \( \bm \lambda \), then
	\[
		\bm \lambda = \grad{\phi}(\vb b)
	\]
\end{theorem}
\begin{proof}
	Let \( \vb a = (a_1, a_2, \dots, a_m) \) be an arbitrary vector.
	Then from the supporting hyperplane condition, for some small \( \delta > 0 \) we have
	\[
		\frac{\phi(\vb b + \delta\vb a)}{\delta} \geq \bm \lambda^\transpose \vb a
	\]
	Since \( \phi \) is differentiable, the limit can be taken to give
	\[
		\grad{\phi}(\vb b) \cdot \vb a \geq \bm \lambda^\transpose \vb a
	\]
	But \( \vb a \) was arbitrary.
	This can only hold if \( \bm \lambda = \grad{\phi}(\vb b) \) as required.
	So the Lagrange multiplier \( \bm \lambda \) at \( \vb b \) is equal to the gradient vector of \( \phi \) which is the gradient of partial derivatives and also the vector of shadow prices.
\end{proof}

Suppose that a particular raw material was not used up.
Then there is a slack value in the inequality.
The shadow price is zero in this instance, since we do not need more of this material.
So the corresponding Lagrange multiplier is equal to zero.
Conversely, if we are paying something for this material, then we must have used up all of that material.
This is exactly the complementary slackness property seen earlier.

There is also an economics interpretation of the dual problem.
Such a problem can be seen from the perspective of the raw material seller.
This seller charges a certain price \( \bm \lambda \) for their raw materials, and then buys the finished product from the factory.
The profit of the raw material seller is
\[
	\underbrace{\bm\lambda^\transpose (h(\vb x) - \vb b)}_{\text{cost of materials}} - \underbrace{f(\vb x)}_{\text{buying products}}
\]
For every choice of \( \bm\lambda \), the factory owner will try to maximise their profit, that is, find an \( \vb x^\star \) such that we maximise
\[
	\underbrace{f(\vb x)}_{\text{selling products}} - \underbrace{\bm\lambda^\transpose (h(\vb x) - \vb b)}_{\text{cost of materials}}
\]
